Blog Spam: A Review
Abstract
Blogs are becoming an increasingly popular target for spammers. The existence of multiple vectors for spam injection, the potential of reaching many eyeballs with a single spam, and the limited deployment of anti-spam technologies have led to a sustained increase in the volume and sophistication of attacks. This paper reviews the current state of spam in the blogosphere at large and, in particular, as seen at TypePad, a major hosted blog service. It also evaluates the effectiveness of two popular open-source email antispam programs at classifying blog comment spam.
Related resources
Blocking Blog Spam with Language Model Disagreement
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. In contrast to other link spam filtering approaches, our method requires no training, no hard-coded rule sets, and no knowledge of complete-web connectivity. Preliminary experiments with identification of typical blog spam ...
Spam Blog Filtering with Bipartite Graph Clustering and Mutual Detection between Spam Blogs and Words
This paper proposes a mutual detection mechanism between spam blogs and words with bipartite graph clustering for filtering spam blogs from updated blog data. Spam blogs are problematic in extracting useful marketing information from the blogosphere; they often appear to be rich sources of information based on individual opinion and social reputation. One characteristic of spam blogs is copied...
Detecting Blog Spams using the Vocabulary Size of All Substrings in Their Copies
This paper addresses the problem of detecting blog spams, which are unsolicited messages on blog sites, among blog entries. Unlike a spam mail, a typical blog spam is produced to increase the PageRank for the spammer’s Web sites, and so many copies of the blog spam are necessary and all of them contain URLs of the sites. Therefore the number of the copies, we call it the frequency, seems to be ...
AIRWeb 2005 Proceedings
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. In contrast to other link spam filtering approaches, our method requires no training, no hard-coded rule sets, and no knowledge of complete-web connectivity. Preliminary experiments with identification of typical blog spam ...
Blog Track Open Task: Spam Blog Classification
Spam blogs or Splogs are blogs with either auto-generated or plagiarized content created for the sole purpose of hosting ads, promoting affiliate sites and getting new pages indexed. Splogs now rival generic web spam and e-mail spam, presenting a major problem to analytics on the blogosphere from basic search and indexing, to opinion, community, influence and correlation detection. This open ta...